Design - Hashtags

Sunday, September 10, 2023

7:51 AM

OneNote has a rudimentary tagging system that lets users apply one or more "tags" to a paragraph. Each tag is indicated by a unique icon. Users can customize the name, which shows up as a tooltip over the icon. Users can then list tags within a specified scope. However, the Find Tags function is merely a list of tags within the scope; there is no way to search for a specific tag icon or name. Also, the tag itself doesn't provide any context, although the Find Tags feature pulls about the first 50 characters from the related paragraph. Many users have complained about the limited usefulness of this feature and its implementation.

 

OneMore Page Tags adds a page-level tagging capability where the user can add one or more tag keywords to a special content box (one:Outline) that appears below the title line, next to the date line. The advantage of this feature is that users can quickly search for one or more tags and navigate through a tree of pages annotated with those tags. It has three disadvantages: page-level tags provide no useful context related to the content of the page; the specially positioned container at the top of the page is fragile and easily disturbed by unintended user manipulation; and OneMore commands intended to process or manage containers in general must treat this container as a special case.

 

OneMore Hashtags is a new feature that provides a more traditional tagging feature using hashtag-keywords within the text of a page. Typing a hashtag within the flow of content is a more natural and intuitive approach. This also lets users search for hashtags within a page, across pages in a section, or even across notebooks, while also providing textual context for each hashtag. Future enhancements could include the ability to highlight specific hashtags.

 

Design

A user can add one or more hashtags to their content, embedded within or in place of any text on a page.

A hashtag may start with either one or two number signs, such as #hashtag or ##hashtag. Tags may contain letters, digits, hyphens, and underscores. Any other character terminates the hashtag.

 

Single-hash hashtags must not begin with a digit. #a123 is valid but #1abc is not. This ensures OneMore differentiates between a hashtag and a numbered sequence like #1, #2, #3, etc. If you want a tag such as #123, use two number signs, as in ##123.

 

Enhancements may include the ability to exclude patterns important to programmers, such as HTML hex colors, like #FFCC00, or C and C++ preprocessor directives, like #include or #define.

 

Valid hashtag examples include:

 

  • #hashtag
  • #hash-tag
  • #hash_tag_12
  • ##12345
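The syntax rules above can be sketched as a regular expression. This is a minimal illustration in Python, not OneMore's actual pattern; the names HASHTAG and find_hashtags are hypothetical, and two details are assumptions not stated above: a single-hash tag must start with a letter or underscore, and a number sign that directly follows a word character, hyphen, or another number sign does not start a tag.

```python
import re

# Assumed pattern: either '##' followed by one or more tag characters, or a
# single '#' followed by a non-digit word character and any tag characters.
# The lookbehind stops 'C#' or a third '#' from starting a tag (an assumption).
HASHTAG = re.compile(r"(?<![\w#-])(?:##[\w-]+|#[^\W\d][\w-]*)")


def find_hashtags(text: str) -> list[str]:
    """Return every hashtag found in text, in order of appearance."""
    return HASHTAG.findall(text)


if __name__ == "__main__":
    print(find_hashtags("Notes on #hash-tag and ##12345, see step #1."))
```

Note that this sketch still treats a hex color such as #FFCC00 as a hashtag; excluding such patterns is listed above only as a possible enhancement.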

 

Out of Scope

Nested tags, such as those in Obsidian. These ostensibly emulate a categorization hierarchy. OneNote already implements such a hierarchy as a structure of notebooks, section groups, and sections, so nested tags would be redundant in OneNote.

 

Possible Functionality

  • Right-click a tag and find related pages
  • Tag navigator window
    • Search for tags, show pages per each tag
    • Click page - navigate to page
    • Click page - find "related" pages, other pages that have the same tags (some, all)
  • Build map of related tags - those that are mentioned on the same page, along with occurrence counts, similar to a relative tag cloud

 

Scanning

Hashtags are discovered using a scanner class that enumerates notebooks, sections, and pages. Each page has a lastModifiedTime attribute that we can compare against the time of the last scan, optimizing each successive scan by skipping pages that haven't changed.
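The skip-unchanged-pages optimization can be sketched as follows. This is an illustration in Python, assuming each page is represented as a dictionary with an ISO 8601 lastModifiedTime string; the function name pages_to_scan and the dictionary shape are hypothetical.

```python
from datetime import datetime, timezone


def pages_to_scan(pages, last_scan):
    """Yield only the pages modified after the previous scan completed."""
    for page in pages:
        # Normalize a trailing 'Z' so older Python versions can parse it.
        stamp = page["lastModifiedTime"].replace("Z", "+00:00")
        if datetime.fromisoformat(stamp) > last_scan:
            yield page


if __name__ == "__main__":
    sample = [
        {"ID": "p1", "lastModifiedTime": "2023-09-10T07:51:00.000Z"},
        {"ID": "p2", "lastModifiedTime": "2023-09-10T09:00:00.000Z"},
    ]
    previous = datetime(2023, 9, 10, 8, 0, tzinfo=timezone.utc)
    print([p["ID"] for p in pages_to_scan(sample, previous)])
```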

 

Hashtag Scanning

 

HashtagService is created upon OneNote startup as a low priority background thread. It scans all (unlocked) pages in all notebooks. It repeats this every two minutes.
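A repeating background scan of this kind can be sketched as a daemon thread that runs a scan and then waits for the interval. This is a Python illustration only; the class name PeriodicScanner and its methods are hypothetical, and thread priority is not modeled because Python threads do not expose it.

```python
import threading


class PeriodicScanner:
    """Run a scan callback on a background thread at a fixed interval."""

    def __init__(self, scan, interval_seconds=120.0):
        self._scan = scan
        self._interval = interval_seconds
        self._stop = threading.Event()
        # daemon=True so the thread never keeps the host process alive.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            self._scan()
            # Sleep for the interval, waking early if stop() is called.
            self._stop.wait(self._interval)
```

Waiting on an event rather than sleeping lets the service shut down promptly instead of blocking for up to two minutes.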

 

The service uses HashtagScanner as the primary business logic, creating a HashtagPageScanner for each page. The page scanner discovers hashtags on the page and returns them as a collection to the scanner, which uses HashtagProvider to resolve (save, delete) hashtags for the page.
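One way to picture the resolve step is as a set difference between the hashtags already stored for a page and those just discovered. The sketch below is an assumption about how such a step could work, not a description of HashtagProvider; the function name resolve is hypothetical.

```python
def resolve(stored: set[str], discovered: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_save, to_delete) for one page.

    to_save holds tags found on the page but not yet stored;
    to_delete holds stored tags that no longer appear on the page.
    """
    return discovered - stored, stored - discovered


if __name__ == "__main__":
    to_save, to_delete = resolve({"#alpha", "#beta"}, {"#beta", "#gamma"})
    print(sorted(to_save), sorted(to_delete))
```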

 

Hashtag Data Store

A number of alternatives were considered.

 

- Alternative 0 - Scan JIT, In-Memory

Scan multiple pages on demand. This could be scoped to the current page, current section, current notebook, or all notebooks.

 

Advantages

  • Simplicity

 

Disadvantages

  • Time consuming. Scanning about 1500 pages takes anywhere from 21 seconds to over a minute based on system load.
  • Not a realistic interactive experience

 

- Alternative 1 - Save to one:Meta

Create a top-level one:Meta entry on the page (name="omHashtags"), making it discoverable using the onenote.FindMeta function. The one:Meta max length is 262144 characters. Even if each hashtag is 25 characters, this leaves room for well over 10,000 hashtags on a single page (262144 / 25 = 10485.76).

 

The meta content could be of the form "##tag1,##tag2, … ,##tagn,"

 

  • Each tag is fully specified, including its double-pound prefix
  • Tags are delimited by a comma
  • The last tag is also followed by a comma, making it easy to substring search for a complete hashtag name of the form "##name,"
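The format above can be sketched in a few lines of Python. The helper names encode_tags and has_tag are hypothetical, and normalizing every tag to a double-pound prefix is an assumption based on the first bullet.

```python
def encode_tags(names):
    """Join tag names as '##name,' entries, including a trailing comma."""
    # Strip any leading number signs, then apply the double-pound prefix.
    return "".join(f"##{name.lstrip('#')}," for name in names)


def has_tag(content, name):
    """Substring test for one complete tag of the form '##name,'."""
    return f"##{name}," in content


if __name__ == "__main__":
    content = encode_tags(["tag1", "tag10", "#design"])
    print(content)
    print(has_tag(content, "tag1"), has_tag(content, "tag"))
```

Because every entry begins with ## and ends with a comma, searching for tag1 does not match tag10, and searching for a partial name matches nothing.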

 

There is no way to associate a tag with its paragraph. We could expand the scheme to include the paragraph object ID.

 

"ID1=##tag1,ID2=##tag2,ID2=##tag3, … ,IDn=##tagn,"

 

This still leaves room for well over 3,000 hashtags on a single page (262144 / 75 = 3495.2533)
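The expanded scheme and the capacity estimates can be checked with a short Python sketch. The names parse_entries and capacity are hypothetical, and the sketch assumes that paragraph object IDs contain neither commas nor equals signs.

```python
META_MAX_CHARS = 262144  # one:Meta max length quoted above


def parse_entries(content):
    """Split 'ID=##tag,' entries into (object_id, tag) pairs."""
    pairs = []
    for entry in content.split(","):
        if entry:  # skip the empty piece after the trailing comma
            object_id, _, tag = entry.partition("=")
            pairs.append((object_id, tag))
    return pairs


def capacity(chars_per_entry):
    """Whole entries of the given length that fit in one Meta value."""
    return META_MAX_CHARS // chars_per_entry


if __name__ == "__main__":
    print(parse_entries("ID1=##tag1,ID2=##tag2,ID2=##tag3,"))
    print(capacity(25), capacity(75))
```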

 

Advantages

  • Takes advantage of well-established built-in features of the OneNote XML schema and meta searching capabilities.
  • No third-party packages are required.

 

Disadvantages

  • The FindMeta function does not search the content of each Meta; it only searches for the name. This means that searching must be done in two steps: discover all pages that have the omHashtags Meta element and then filter on the ones with the target hashtag in the content attribute. This may be slow and inefficient.
  • Increases base size of a page, making all OneMore features slower to save through the Interop onenote.UpdatePageContent API.
  • Must store last-scan-time someplace, perhaps in the OneMore settings file or a new file in the app data folder.

 

- Alternative 2 - Save to File.json

Serialize a collection of Hashtag models to JSON, either distributed by scope or including a scope property.

 

Advantages

  • Simple and cheap
  • Could work well for relatively smaller data sets

 

Disadvantages

  • Could become a performance bottleneck when the store grows over a certain size.
  • Entire store needs to be read for each query and rewritten for every modification.
  • No built-in searching capabilities, other than LINQ.Where() or similar filtering; not indexed, not performant.
  • Although a user may have fewer than a couple of hundred tags, those could be duplicated across dozens of pages, quickly multiplying contextual references and making the stored model quite large and cumbersome. One solution may be to normalize data into referenced models, but then we're starting to reinvent a DBMS.
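The read-everything, rewrite-everything behavior described above is visible in even a minimal JSON store. The following Python sketch is illustrative only; the function names and the record fields (tag, pageID) are hypothetical.

```python
import json
from pathlib import Path


def load(path: Path) -> list:
    """Read and parse the whole store; there is no partial read."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def save(path: Path, records: list) -> None:
    """Rewrite the whole store, even for a single changed record."""
    path.write_text(json.dumps(records, indent=2), encoding="utf-8")


def add(path: Path, record: dict) -> None:
    records = load(path)   # full read
    records.append(record)
    save(path, records)    # full rewrite


def find(path: Path, tag: str) -> list:
    """Linear, unindexed filter over every stored record."""
    return [r for r in load(path) if r["tag"] == tag]
```

Every call to add or find touches the entire file, which is why this approach suits only relatively small data sets.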

 

- Alternative 3 - Save to SQLite

Record last scan date/time in a separate control table. Use this timestamp to compare against the lastModified timestamp of each page to know whether we need to rescan an updated page.

 

Record each hashtag on every page individually. We can capture the hashtag, its location on the page, and when it was recorded. This provides the contextual location of each hashtag when searching and displaying results to the user.

 

Hashtag ER Model

 

 

hashtag_scanner Table

  • Contains exactly one row
  • scannerID is 0.
  • version is used to know when to upgrade the schema, currently set at 1.
  • scanTime indicates the timestamp of the most recently completed scan. Used to compare against the lastModifiedTime of each page to know whether to scan its contents

 

hashtag Table

  • Each row indicates the existence of at least one occurrence of a named hashtag in a specified paragraph
  • Paragraph is uniquely identified by its objectID (found on pageID)
  • snippet captures the context of the tag, including surrounding text
  • lastModified could be used to show the age of the hashtag - when it was first discovered

 

hashtag_page

  • Normalizes page references for multiple tags
  • notebookID and sectionID provide filtering capabilities
  • titleID is used to navigate to the top of a page when already on that page
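The three tables can be sketched with Python's built-in sqlite3 module. Column names follow the ER model in this document; the key choices, the NOT NULL constraints, the CHECK that pins scannerID to 0, and the helper names open_store and find_pages are assumptions for illustration, not the actual OneMore schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hashtag_scanner (
    scannerID INTEGER PRIMARY KEY CHECK (scannerID = 0),  -- single row
    version   INTEGER NOT NULL,
    scanTime  TEXT NOT NULL
);
CREATE TABLE hashtag_page (
    moreID     TEXT NOT NULL,
    pageID     TEXT NOT NULL,
    titleID    TEXT NOT NULL,
    notebookID TEXT NOT NULL,
    sectionID  TEXT NOT NULL,
    path       TEXT NOT NULL,
    name       TEXT NOT NULL,
    PRIMARY KEY (moreID, pageID)
);
CREATE TABLE hashtag (
    tag          TEXT NOT NULL,
    objectID     TEXT NOT NULL,
    moreID       TEXT NOT NULL,
    snippet      TEXT NOT NULL,
    lastModified TEXT NOT NULL,
    PRIMARY KEY (tag, objectID)
);
"""


def open_store(path=":memory:"):
    """Create the schema and seed the single scanner row (version 1)."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    db.execute("INSERT INTO hashtag_scanner VALUES (0, 1, '')")
    db.commit()
    return db


def find_pages(db, tag):
    """Return (path, name, snippet) for each paragraph carrying the tag."""
    return db.execute(
        "SELECT p.path, p.name, t.snippet FROM hashtag t "
        "JOIN hashtag_page p ON p.moreID = t.moreID "
        "WHERE t.tag = ? ORDER BY p.path, p.name",
        (tag,),
    ).fetchall()
```

Joining hashtag to hashtag_page on moreID is what lets a search return both the page location and the snippet without repeating page details on every hashtag row.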

 

Advantages

  • Solved problem; "quick to market!"
  • Can enhance, easier to evolve schema compared to a formatted string stored in a Meta element.

 

Disadvantage

  • Another NuGet package and thing to maintain

 

 

Hashtag User Interface

 

- Alternative 0 - Build a new window

Adds complexity and noise, cognitive dissonance.

 

- Alternative 1 - Integrate with Navigator window

Introduce a tabbed interface to the window. The primary tab will display history navigation and the secondary tab will display hashtag searching. However, users may not intuitively look here.

 

- Alternative 2 - Integrate with Search and Find windows

Both the Search and Copy/Move dialog and the Find Tagged Pages dialog are appropriately relevant to also finding hashtags. Although they are buried beneath the Search menu, there is a more intuitive correlation. An additional "Find Hashtags" command could be added that opens/prefers one of these dialogs.

 

 

══════════════════════════════════════════════════════════════════════════════════════════════════

SQLite

 

 

 

 

──────────────────────────────────────────────────────────────────────────────────────────────────

Hashtag Scanning PlantUML

@startuml Hashtag Scanning

skin rose

skinparam defaultFontSize 9

skinparam ParticipantPadding 20

skinparam BoxPadding 80

scale max 500 width

class HashtagScanner

together {

  class HashtagPageScanner

  class HashtagPageScannerFactory

}

class Hashtag

class HashtagProvider

HashtagService -[hidden] HashtagScanner

HashtagScanner - HashtagProvider : Uses >

HashtagScanner -- HashtagPageScanner : Uses >

HashtagPageScannerFactory - HashtagPageScanner : Creates >

HashtagPageScanner - Hashtag : Discovers >

@enduml

 

Hashtag ER Model PlantUML

@startuml Hashtag ER Model

skin rose

skinparam ParticipantPadding 20

skinparam BoxPadding 40

scale max 450 width

left to right direction

entity hashtag_scanner {

  * scannerID : number

  --

  * version : number

  * scanTime : text

}

entity hashtag {

  * tag : text

  * objectID : text

  --

  * moreID : text

  * snippet : text

  * lastModified : text

}

entity hashtag_page {

  * moreID : text

  * pageID : text

  --

  * titleID : text

  * notebookID : text

  * sectionID : text

  * path : text

  * name : text

}

hashtag_scanner -[hidden]- hashtag

hashtag }|--|| hashtag_page : moreID

@enduml

 

 

#omwiki #omdeveloper #omdesign

 

© 2020 Steven M Cohn. All rights reserved.
